Daniel MARTIN

An extra-ordinary DBMS

Relational DataBase Management Systems (RDBMSs) currently dominate the database industry. They are the best tools for managing data in a computerized world that is still largely devoted to simple transactional processing, whether client-server or not. SQL's ability to answer all questions on the value of information in table models gives RDBMSs an advantage for decision support applications. However, a new type of DBMS has come along today that will force its way into the market -- the High Performance Object-Oriented DBMS (HPOD).

This type of DBMS answers the needs for which traditional RDBMSs are ill-suited:

Complex structured data, which the table model cannot represent. SQL will only be able to manipulate such data at the SQL3 level, a few years from now, and even then it will not be able to define it. Examples include time series, geographic maps, CAD/CAM plans and technical documents combining text and photographs.
Large data items, such as a multi-Kbyte VGA-format photograph, which even after JPEG compression is 100 times larger than the average table row in an RDBMS.
Continuous reading and writing of objects, such as image extraction fast enough to smoothly support animated video for hundreds of simultaneous users.

Object-oriented DBMSs (ODBMSs), for their part, are well suited to large objects with complex structures, but their performance remains modest and today they cannot reliably support round-the-clock multi-user environments.

Nonetheless, the need for this type of large-scale data manipulation exists today. It is also growing, as individual and work-group productivity increasingly relies on intuitive, readily interpretable data -- users are no longer satisfied with text and figures, which are cumbersome to read and interpret. Many professional activities now require curves, plans and diagrams, sound, and complex objects combining all of these elements. We are heading towards a computer age in which decision and communication support cannot be separated from the automation of repetitive tasks, and which requires a DBMS that can store all of this data and serve all of the users in a work-group. The database should be able to handle geographic data and instantly answer questions like "Which stocks show a significant variation in their price curve as a function of time (rising more than 10%, then dropping again) in 1994?" or "Which trips to Africa provide the opportunity to see wild animals while staying in a nice hotel?" If the answer includes a map, a photo or a 100-item list, the system needs to find and display them within one second, all while serving the requests of the other users.

Meeting such needs calls for all of the power of an ODBMS and the search capabilities of an RDBMS, along with performance ten to one hundred times greater than theirs for the manipulation of large, complex objects; in other words, a High Performance Object-Oriented DBMS (HPOD). Today such an HPOD exists: its name is Matisse.

Developed by a Franco-American team, this DBMS is currently installed on 2,300 machines worldwide and its sales are rapidly increasing. The database server is available on UNIX (Sun, HP, Digital and IBM) and on Windows NT. The database client runs on the same machines, plus Silicon Graphics, AIX, Windows and Macintosh systems. Communication between clients and servers uses RPCs. Matisse has caught the attention of Gartner Group analysts, who ranked the product among the "Visionaries with the greatest chances to succeed".

A real ODBMS

Matisse is first and foremost a true ODBMS -- it manages objects with unique identifiers (object ids), which can associate all sorts of data and include an unlimited number of other objects. It supports the notions of class and version. Traditional ODBMSs have no data dictionary and force each application to hard-code, in its own source, the structure of the objects (properties, composite links and inheritance) and the logic for manipulating them. Matisse, on the other hand, has a dictionary that holds all of these descriptions. This makes applications far more independent of the structure of complex objects: the structure can evolve without impacting the application, which is a great advantage.

When objects are complex enough that functions are needed to describe their content, Matisse allows these functions to be written and integrated into the DBMS engine. One can thus analyse the description of a trip to extract relevant properties, whether they come from pictures or text. These properties are then used to index the object for later searches. Such an index is not attached to columns, as is the case with RDBMSs, but is built by an ad hoc algorithm. This is what makes it possible to characterize a stock's fluctuation curve automatically and to pinpoint the stocks that follow a certain type of variation pattern. The Matisse engine can thus recognize a structure, which is impossible with an RDBMS and possible with an ordinary ODBMS only inside a given application.
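As a rough illustration of indexing on a value computed by a function rather than on a stored column, the Python sketch below derives one property from a price series (did it rise more than 10% and then fall back?) and indexes objects by it. The names `rises_then_drops` and `FunctionIndex`, the sample data and the exact pattern rule are all assumptions made for this example; none of it is Matisse's API.

```python
from collections import defaultdict


def rises_then_drops(prices, threshold=0.10):
    """Return True if the series climbs more than `threshold` above its
    first value and later falls back to or below that first value.
    (An assumed, simplified definition of the pattern.)"""
    if not prices:
        return False
    start = prices[0]
    peaked = False
    for p in prices[1:]:
        if not peaked and p > start * (1 + threshold):
            peaked = True
        elif peaked and p <= start:
            return True
    return False


class FunctionIndex:
    """Hypothetical index keyed by the result of a function applied to
    each object, not by a stored column."""

    def __init__(self, key_func):
        self.key_func = key_func
        self.entries = defaultdict(set)

    def add(self, object_id, obj):
        # The key is computed once, when the object is indexed.
        self.entries[self.key_func(obj)].add(object_id)

    def lookup(self, key):
        return sorted(self.entries.get(key, ()))


# Made-up sample series, for illustration only.
stocks = {
    "AAA": [100, 105, 112, 108, 99],   # up 12%, then back below 100
    "BBB": [50, 51, 52, 53, 54],       # steady rise, never more than +10%
    "CCC": [20, 23, 25, 24, 22],       # up 25%, but never back to 20
}

index = FunctionIndex(rises_then_drops)
for symbol, series in stocks.items():
    index.add(symbol, series)

matching = index.lookup(True)
print(matching)
```

The point of the design is that the pattern test runs when an object is indexed, so a later query only looks up the precomputed key instead of re-scanning every series.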

In order to offer the same flexibility in the evolution of objects and links as an ODBMS, Matisse allows its dictionary to be modified dynamically. So that such changes do not block use, as they would with an RDBMS, Matisse manages versions of its dictionary just as it manages versions of its basic objects. The user can thus work at all times on a dictionary and on objects that are mutually consistent at a given version level.

All types of data models

The ease with which the Matisse dictionary supports every conceivable linking structure has another interesting effect -- this HPOD can support all data models. One can thus describe and manipulate CODASYL networks such as IDS and IDMS, hierarchies such as DL/1, relational tables, entity-relationship models and so on. At any time, for performance reasons, Matisse can manage true connections through direct and inverse pointers. Such pointers are mandatory in a hierarchical or network model; in a table-based model, where they are not strictly necessary, they make joins unnecessary. In a few months, Matisse will also offer entry-level SQL2 support, which will allow pointers to be used while providing all of the flexibility and power of the relational approach. The user will thus be able to choose the mode of operation best adapted to each part of the database.
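The difference between following direct and inverse pointers and resolving a relationship by a join can be sketched in a few lines of Python. This is only an illustration under assumed names (`Node`, `link`, the customer and order data); it says nothing about how Matisse actually stores its links.

```python
class Node:
    """Hypothetical object holding direct pointers to related objects
    and inverse pointers back from them."""

    def __init__(self, name):
        self.name = name
        self.targets = []   # direct pointers
        self.sources = []   # inverse pointers


def link(source, target):
    """Maintain both directions of a relationship together."""
    source.targets.append(target)
    target.sources.append(source)


# Pointer-based model: a customer points to its orders.
alice = Node("alice")
order1, order2 = Node("order-1"), Node("order-2")
link(alice, order1)
link(alice, order2)

via_pointers = [o.name for o in alice.targets]       # follow direct pointers
owner_of_order2 = [c.name for c in order2.sources]   # follow inverse pointers

# Table-based model: the same relationship held as key values and
# resolved by a join that scans the rows and matches keys.
customers = [{"id": 1, "name": "alice"}]
orders = [
    {"id": "order-1", "customer_id": 1},
    {"id": "order-2", "customer_id": 1},
]
via_join = [
    o["id"]
    for c in customers
    for o in orders
    if c["name"] == "alice" and o["customer_id"] == c["id"]
]

print(via_pointers, owner_of_order2, via_join)
```

Both routes return the same orders; the pointer version reaches them directly from the customer object, while the join has to compare key values across the two tables.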

Access to the data and connections among them are also made possible by indexes, either on column values, as in a classical DBMS, or on values computed by various functions on the objects, which is even more powerful. The hashing index function is particularly interesting, as it is highly resistant to collisions (homonyms) and extremely efficient when one knows the exact value of an access key. Thanks to the diversity of its data access mechanisms and to the successive versions of its dictionary, Matisse allows one to find the optimal compromise between performance and flexibility of evolution for each version of the dictionary -- links and indexes can be created and removed at will, from one version to the next, without impacting applications.

Objects stored in a Matisse database can change size when they are updated. Size variations can be so large that it would be difficult to manage them by writing the updated object over the old one, since one would then have to manage the holes and the overflows. Instead, when an object is updated, its version changes, and the new version coexists with the old one in the database. Every now and then, the obsolete versions are deleted globally. This approach has a number of advantages. The new version is written sequentially, all at once, rather than being fragmented, which makes the write faster and the next read faster too. Two versions of the object are available, which makes database consistency a lot easier to maintain. Finally, journaling is no longer needed: the "before" image of an object is the old version and the "after" image is the new one. For interrupted transactions, one simply cancels the objects of the abandoned version. The problem of locking during reads also disappears, since one only reads the "old" version, which is in general the latest committed one. As with RDBMSs, grouped writes (group commit) are used to expedite the update process. As with other DBMSs, Matisse saves on disk access time by using a cache memory. Unlike other DBMSs, Matisse has a fast warm restart because the logs are replaced by versions: after a failure, Matisse simply cancels the incomplete transactions and recovers the unneeded disk space.
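The versioning scheme described above can be sketched as a toy in-memory store in Python: an update adds a new version instead of overwriting, readers see the latest committed version without locking, an abort simply drops the uncommitted versions, and a periodic purge removes obsolete ones. The class `VersionedStore` and all of its method names are assumptions for this illustration, not Matisse's storage format or API.

```python
class VersionedStore:
    """Toy store in which an update appends a new version instead of
    overwriting the old one (illustrative only)."""

    def __init__(self):
        self.versions = {}   # object_id -> list of committed values
        self.pending = {}    # txn_id -> {object_id: uncommitted value}

    def begin(self, txn_id):
        self.pending[txn_id] = {}

    def write(self, txn_id, object_id, value):
        # The new version is kept aside; committed versions are untouched.
        self.pending[txn_id][object_id] = value

    def read(self, object_id):
        # Readers take the latest committed version and need no lock.
        return self.versions[object_id][-1]

    def commit(self, txn_id):
        for object_id, value in self.pending.pop(txn_id).items():
            self.versions.setdefault(object_id, []).append(value)

    def abort(self, txn_id):
        # No journal to replay: the uncommitted versions are just dropped.
        self.pending.pop(txn_id, None)

    def purge(self, keep=1):
        # Periodic global delete of obsolete versions (keep must be >= 1).
        for object_id in list(self.versions):
            self.versions[object_id] = self.versions[object_id][-keep:]


store = VersionedStore()
store.begin("t1")
store.write("t1", "photo", "v1")
store.commit("t1")

store.begin("t2")
store.write("t2", "photo", "v2")
seen_during_t2 = store.read("photo")   # still the committed version
store.abort("t2")
after_abort = store.read("photo")

store.begin("t3")
store.write("t3", "photo", "v3")
store.commit("t3")
history_before_purge = list(store.versions["photo"])
store.purge()
history_after_purge = store.versions["photo"]
print(seen_during_t2, after_abort, history_before_purge, history_after_purge)
```

In this sketch the old version plays the role of the "before" image, which is why the abort path needs no log: nothing committed was ever modified.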

Another of Matisse's original features is the way it writes information to disk. The writing mechanism automatically groups objects written at the same time, and an algorithm takes into account the objects read at the same time in order to regroup them on disk for later simultaneous reads. It is as if Matisse performed intelligent, dynamic "clustering" (in the Oracle sense of the term), which has excellent effects on performance.

Outstanding performance

As far as performance is concerned, Matisse automatically balances the load across the various disk subsystems to shorten the input/output (I/O) queues. There is no need to debate where to place tables and indexes to tune the server, or how much room to allocate to storage space and extensions. This balancing can benefit from the redundant writing abilities of the DBMS, which can manage mirroring to increase availability and read performance. Of course, Matisse can take advantage of SMP (symmetric) or MPP (massively parallel) multiprocessor architectures and of multithreading to increase its throughput. It is very fast: for a given CPU power, it is faster than an RDBMS even in heavy-duty transaction processing. In a TPC-C benchmark currently being audited, it exceeds 800 tpmC on a single-processor SPARC 20 server (75 MHz, 125 SPECint). Overall, this speed is due to Matisse's advantages over RDBMSs: fewer locks with finer granularity (one can lock a class, an object or even a sub-object), no journal requirement, load balancing of disk I/Os, pointers used rather than joins, and better grouping of data manipulated as a whole.

As far as object management goes, a demo installed in my lab, on a small SPARC 10 server running Solaris and a Pentium 90 client under NT, showed that Matisse can locate and extract dozens of images per second, accompanied by traditional data. No RDBMS can go that fast. Display of the extracted images on the PC is slowed down by the Ethernet throughput, the decompression time and the graphics card, even though the latter was a 64-bit Matrox. Even so, the images appear at 10 or more per second, while the processor is loaded at only a few percent of its capacity.

As for installation and administration, Matisse is about as simple as a workgroup DBMS, in other words a great deal simpler than an industrial RDBMS. Complete installation of the server under UNIX and of the client under NT takes about three to four hours. The entire performance optimization process consists of setting a half-dozen dynamic parameters, and supervising the system with a graphical tool takes a tenth of the time required with an RDBMS.

A DBMS for tomorrow's I.T.

Should one think of Matisse as a replacement for an RDBMS? Both approaches offer very good transactional services, excellent and scalable performance for (small) transactions, and sufficient robustness to support continuous operation. But Matisse is not currently interfaced with development tools for management applications comparable to those available with RDBMSs. Conversely, RDBMSs do not have object services comparable to those of Matisse, or sufficient performance to support such services. Today, Matisse should be considered for applications with complex objects, very large databases (up to 4 billion objects), applications in which graphical objects and decision support go hand in hand with repetitive production tasks, large libraries, World Wide Web multimedia servers, telesales, applications that call for distinct data versions and, generally speaking, environments where productivity requires high-performance object handling.

http://www.adb.com/
